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Evolutionary algorithm-based analysis of gravitational microlensing 
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ABSTRACT 

A new algorithm developed to perform autonomous fitting of gravitational microlensing 
lightcurves is presented. The new algorithm is conceptually simple, versatile and robust, and 
parallelises trivially; it combines features of extant evolutionary algorithms with some novel 
ones, and fares well on the problem of fitting binary-lens microlensing lightcurves, as well 
as on a number of other difficult optimisation problems. Success rates in excess of 90% 
are achieved when fitting synthetic though noisy binary-lens lightcurves, allowing no more 
than 20 minutes per fit on a desktop computer; this success rate is shown to compare very 
favourably with that of both a conventional (iterated simplex) algorithm, and a more state-of- 
the-art, artificial neural network-based approach. As such, this work provides proof of con- 
cept for the use of an evolutionary algorithm as the basis for real-time, autonomous modelling 
of microlensing events. Further work is required to investigate how the algorithm will fare 
when faced with more complex and realistic microlensing modelling problems; it is, however, 
argued here that the use of parallel computing platforms, such as inexpensive graphics pro- 
cessing units, should allow fitting times to be constrained to under an hour, even when dealing 
with complicated microlensing models. In any event, it is hoped that this work might stimulate 
some interest in evolutionary algorithms, and that the algorithm described here might prove 
useful for solving microlensing and/or more general model-fitting problems. 

Key words: gravitational lensing: micro - binaries: general - methods: numerical 
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1 INTRODUCTION 

Gravitational microlensing is an established technique for detect- 
ing exoplanets. When a massive foreground object (the lens, e.g. 
a planet and its host star) passes in front of a distant, background 
star (the source), the latter is magnified and displays a characteris- 
tic microlensing lightcurve. Since the lens is detected on account 
of its mass rather than its luminosity, faint planetary-mass objects 
can be detected by microlensing. Indeed, microlensing has the po- 
tential to yield the most representative statistical sample of Milky 
Way planets - unlike many complementary techniques used to de- 
tect exoplanets, it is in principle sensitive enough to detect even 
very distant, Earth- mass planetary objects l lBennett & Rhidfl996l ; 
IWambsganssI 1201 ll) . Unfortunately, microlensing events are ex- 
tremely rare, requiring a very precise alignment between observer, 
lens and source: as of luly 2012, of the several hundred known exo- 
planets, only around a doze n were discovered by microlensing (see 
IShvartzvald & Maozl2012l and references contained therein). Still, 
many of these detections have constituted import ant discoveries 
in the broader context of exoplanetary scien ce (e.g. lBeaulieu et all 
l2006tlGaudi et alj|2008l : lcassan et al]|2012l) - recently, it has even 
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been suggested that microlensing has facilitated the detection of 
a number of free -floating, plan etary mass objec ts in the Galaxy 
dSumi et al1l201il ; cf., however. lOuanz et alj2012l) . 

Very comprehensive models do exist for describing microlens- 
ing events and their corresponding lightcurves, though unfortu- 
nately it is notoriously difficult to use these models to interpret mi- 
crolensing events: amongst other complicating factors, the models 
tend to be highly nonlinear, and have enormous paramet er spaces 
that a r e often fraught w ith ambiguities and degeneracies dDominikl 
ll999l : IVermaa3l2007l) . Even the simplest possible microlensing 
model (viz. a point-like source star lensing an isolated m ass) poses 
some nontr ivial challenges to microlensing modellers dDominikl 
I2008tl2009l) . 

This paper presents a new metaheuristics algorithm which 
combines features of extant evolutionary algorithms (genetic algo- 
rithms; evolution strategies) with some novel ones, developed with 
a view to performing efficient and autonomous fitting of (especially 
binary-lens) microlensing lightcurves. The algorithm is, however, 
robust enough to solve general nonlinear optimisation problems, 
and its development was informed by tests carried out on a broad 
class of optimisation problems. 

Section 2 of the paper gives an overview of the binary-lens 
model used to test the performance of the new algorithm; Section 
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Figure 1. Geometry assumed in the SBLM. 



3 describes evolutionary algorithms in general, as well as the new 
algorithm which is the focal point of this work (Appendix A, at the 
very end of the paper, provides a more detailed look at the 'nuts 
and bolts' of the new algorithm); Section 4 focuses on the fitting 
experiments (and their results) used to assess the new algorithm; 
and Section 5 contains a commentary on the results presented in 
the preceding section. Section 6 concludes. 



2 OVERVIEW OF BINARY-LENS MODEL 

A small companion to a stellar lens can be detected via per- 
turbations it introduces to the lightcurve expected for an iso- 
lated lens. The resulting binary-lens lightcurve can exhibit a very 
wide variety of morphologies: it might be practically indistin- 
guishable from a point-lens lightcurve, or it might exhibit com- 
plex structure including significant asymmetry, multiple peaks, 
and spikes of high (formally infinite) magnification produced 
during so-called 'caustic crossing s' dMao & Paczvnskj Il99ll ; 
iNight. Pi Stefano & Schwamb1l2008l) . Models of such binary-lens 
events are particularly useful for characterising exoplanetary mi- 
crolensing events: even a system with one star and multiple plane- 
tary bodies can often be well-approximated either by ignoring mul- 
tiple planets, or by treating each planet plus its host star as an in- 
depe ndent binary system, provided source magnification is not too 
high ( iGaudi. Naber & Sackettll 19981) . 

The model introduced here features 7 basic parameters that de- 
scribe rectilinear motion of an unblended, point-like source across a 
static binary lens. This simple binary-lens model (hereafter SBLM) 
neglects some higher-order effects that need to be taken into ac- 
count when carrying out in-depth modelling of binary-lens events: 
nevertheless, the model is far from trivial, and is useful enough to 
provide first-order fits to many binary-lens events. 

For example, even though the assumption of a point-like 
sourc e will break down in a large fraction of real events JPominikl 
l2008h . the regions of lightcurves affected by finite-source effects 
tend to be localised - usually affecting the high-magnification 
peaks associated with e.g. caustic crossings - so even when such 



effects are present, good SBLM-based fits can often be obtained 
simply by excluding t he relevant regions of lightcurves from the 
fitting i Vermaakl2007l) . It should be emphasised, however, that this 
work does not attempt to advocate the SBLM for use in modelling 
real microlensing events: the model is adopted here primarily to 
provide a relatively straightforward platform for benchmarking dif- 
ferent algorithms studied later in this work (see Section[4]l. 

2.1 Model parameters 

A useful scale for characterising a lensing event is the so-called 
Einstein radius or Einstein angle of the primary lens, 8e, defined 
as: 

where G is the gravitational constant, M is the mass of the primary 
lens, c is the vacuum speed of light, Dl is the distance between 
observer and the lens, and Ds is the distance between observer 
and the source. The Einstein radius is the angular radius of the ring 
that would be formed in the case of perfect source-lens-observer 
alignment; for typical Bulge microlensing events, Be ~ 1 mas. 

For convenience, complex notation is used here to describe 
the lensing event. The origin of the coordinate axes is placed at the 
projected position of the primary lens; with no loss of generality 
the secondary lens is placed on the real (+x) axis; and f € C is 
used to denote the position of the source on the sky. The SBLM's 
seven parameters, and their physically-permissible ranges, then, are 
as follows (refer also to Fig.QJ: 

(i) a, the projected angular orbital separation, in units of 8e, 
between the primary and secondary lenses (0 < a < oo); 

(ii) b, the length, in units of 8e, of the projected source-lens 
impact vector b (0 < b < oo); 

(iii) mo, the unlensed magnitude of the source; 

(iv) q, the ratio of the mass of the secondary lens to that of the 
primary lens (0 < q < 1); 

(v) 9, the angle formed between the +x axis and the projected 
impact vector b (0 < 8 < 2n); 

(vi) tE, the Einstein radius crossing time, i.e. the time it takes 
the source to travel an angular distance of Be', and 

(vii) t m , the time of closest projected approach of the source to 
the primary lens. 

Note that to obtain physical angular distances, the parameters a and 
b must be scaled by Be- 

Many ot her parametrizations of binary lensing even ts are pos- 
sible (see e.g. iMao & Pi Stefandfl995l : IPominikll 19991) . although 
most are qualitatively similar to the one used here. In all subse- 
quent discussions, the seven SBLM parameters will be assumed to 
be constrained to the ranges given in Table Q] These ra nges were 
chosen to fa cilitate direct comparison with the results of IVermaakl 
j2003ll2007l) : they cover most of the lensing zone, and thus the ge- 
ometries with the highest likelihood of detecting secondary lenses. 

2.2 Calculation of binary-lens lightcurves 

One of the primary difficulties associated with the SBLM is that 
it does not provide an explicit expression for the amplification of 
the source as a function of time and lensing parameters. Instead, 
the time- varying source position relative to the lens, as well as the 
lens geometry, is first used to calculate the position of the images of 
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Table 1. Allowed parameter ranges for all fits and simulated lightcurves. 



Parameter 


Units 


Minimum 


Maximum 


a 




0.6 


1.7 


b 




0.001 


1 


mo 


mag 


18 


22 


1 




0.1 


1 





rad 





2- 


£e 


d 


5 


50 


tm 


d 


-20 


20 



the source that are formed by the lens; the light from each of these 
images is then summed to calculate the total amplification. 

Assuming rectilinear motion, straightforward geometric con- 
siderations yield the source position as a function of time, and two 
of the lensing parameters: 



(" = t sin 8 + b cos 8 + i (— r cos 8 + fesin ( 
where r is a dimensionless time parameter: 



t E 



(2) 



(3) 



Assuming zero external gr avitational shear and a binary lens, the 
general le ns equation (see iBourassa. Kantowski & Norton! 1 19731 ; 
IWitjl99fj|) reduces to 



z a — z 



(4) 



which, with C, known, is an implicit expression for the image posi- 
tions, z £ C. It may be shown that this special case of the lensing 
equation has always either n = 3 or n = 5 solutions, correspond- 
ing to 3 or 5 lensed ima ges. Following some nontrivial algebraic 
manipulation Jwlttlll990h . Eqn. 2]can be converted into a complex 
quintic polynomial equation of the form 

f(z;(,a,q) = Y, 5 c j (C,a,q)z i =0, (5) 
z — 'j=0 

where Cj £ C; the roots of this polynomial can be found using any 
one of a number of standard numerical techniques, for example the 
Jenkins-Traub algorithm or Laguerre's algorithm (in practice it is 
more convenient to solve this polynomial equation than trying di- 
rectly to find all the solutions to Eqn.H). Two of the five polynomial 
roots may not be true solutions to the lensing equation itself, but 
these spurious solutions may readily be eliminated by direct substi- 
tution into Eqn. [4] In an alternative though equivalent formulation, 
the lensin g equation c an be cast into the form of a real polynomial 
equation (Asada 2002). 

Once the image positions Zi have been solved for, the amplifi- 
cation due to the i th lensed image is calculated via 



Ai 



dot 1 Ji 1 



1 - 



(a — Zi) 



(6) 



where J, is the Jacobian of Eqn. [4] for the case of the i th image. 
Finally the amplification due to the n individual images is summed, 
so that the lensed magnitude m is given by: 



,(*0 



-2.5 ■ loe 



(EL 



A 



+ m ; 



(7) 



the superscript fc = l,2,...,JVis introduced to emphasise that the 
preceding calculations culminate in a single point on an TV-point 
lightcurve. 



2.3 Difficulties associated with fitting binary lightcurves 

Given a set of parameters, obtaining the lightcurve predicted by the 
SBLM is a nontrivial though relatively straightforward task. Un- 
fortunately, however, the associated inverse problem - i.e., given a 
lightcurve, using the SBLM to determine the physical parameters 
associated with the lensing event - is extremely difficult; yet it is of 
course the solution to this inverse problem that is of value insofar 
as the interpretation of real-world microlensing data is concerned. 
Some of the difficulties associated with this fitting problem include 
the following. 

(i) Computational complexity. Generating even a single iV-point 
lightcurve, i.e. solving the forward problem, requires, as a mini- 
mum, the roots of N different quintic polynomials to be found nu- 
merical ly (for techniques on solving quintic polynomial equations , 
see e.g. lKing|l993 : lRalston & Rabinowitj200l(;|Press et al.l2007l) . 

(ii) Volume of parameter space. The SBLM has a 7-dimensional 
parameter space, and the physically-realistic ranges for some of the 
parameters (e.g. imp act parameter, le ns mass ratio) can span many 
orders of mag nitude. IVermaakl d2003h estimated that an exhaustive 
grid search of the parameter space would take on the order sev- 
eral of years to complete (assuming a 2% error tolerance on all 
parameters, the very conservative parameter ranges in TableQ] and 
~ 2 ms/lightcurve calculation). Any extensions to the SBLM only 
exacerbate this problem. 

(iii) Nonlinearity. The mapping from SBLM parameter space to 
source amplification is highly nonlinear - consequently, nearly- 
identical parameter sets can give rise to dramatically different 
lightcurves. A typical regression surface will contain a large num- 
ber of local optima, will be non-smooth, and will contain no clue 
as to where the global optima (with respect to some, e.g. x 2 -based, 
metric) are to be found; further more the wells of convergenc e 
around optima tend to be small jVermaakll2003l ; lBennettll2O10h . 
and when dealing with noisy data, true solutions need not even 
correspond to a globally optimal solutions. Using biased param- 
eter estimators (e.g. using a maximum likelihood estimator, rather 
than, say, a posterior-mode estimator) can also lead to globally op- 
timal solutions that are, potentially, nowhere near the true solution 
(Dominik 2008; see also Sectionf4]of this paper). 

(iv) Degeneracy. The aforesaid problem of similar parameters 
leading to very different lightcurves can be overcome by making 
a denser sampling of the parameter space; more challenging, how- 
ever, is the problem that a number of very different para meter sets 
can g ive rise to virtually indistinguishable lightcurves dDominikl 

In practice, such degeneracies can either be resolved by us- 
ing non-photometric (e.g. spectroscopic) data to reduce the effec- 
tive parameter space, or they can be avoi ded with high quality pho- 
tometry and d ense lightcurve sampling jMao & Di Stefanolll995l : 
lBennettll2008l) Fl 

The impasse, then, is that there is little hope of finding the 'correct' 
set of parameters to describe a microlensing event if one does not 
make a dense and extensive search of the parameter space, but the 
computational burden of doing so is often crippling. A thorough 
search is especially important because even if one does happen to 
find one set of parameters that seems to provide a very good de- 
scription of the event in question, there may exist many other pa- 
rameter sets that provide equally good or better descriptions of the 



Some of the degeneracies identified bv lDominilJ ii999) will not even 
be resolved with any curren tly-achievable data qualities and sampling rates 
I Shvartzvald & Maoz 2012). 
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Figure 2. Schematic to illustrate the workings of the canonical genetic algorithm, where trial solutions are encoded as binary strings. Each bit represents 
a gene; here, lighter shades are used to represent solutions determined to be fitter, according to any problem-specific metric. The 'initial population (first 
array) contains a range of solutions, some good (lighter shades) and some bad (darker shades). During ranking, the solutions are sorted from best to worst 
(second array). The genes from fitter solutions are then favoured during recombination, which is apparent from the fact that the new solutions (third array) 
are assembled mainly from components of the original lighter-coloured solutions; finally, random mutations - which may or may not be beneficial - are 
introduced (fourth array), and the whole process repeats. The fine-grained details of many EAs differ from the simple algorithm illustrated here; however, most 
EAs share the same underlying concept of a population of trial solutions evolving, under the action of a few simple evolutionary operators, towards optimality 
(the constitution of the or an 'optimal solution will depend on the problem at hand). 



event. A good search algorithm should, therefore, be able to locate, 
in a reasonable amount of time, all potential solutions; actually 
choosing between competing model parameter sets is a different 
problem altogether dBumham & Andersonll2002l) . 

2.4 A simple noise model 

A simple model of photometric noise was developed, during the 
course of this work, based on fits to 1000 microlensing events ob- 
served during th e 201 1 campaig n of the Optical Gravitational Lens- 
ing Experiment (Udalski 2005); the full dataset comprised approx- 
imately 1.5 million /-band magnitudes, along with estimated pho- 
tometric errors, seeing estimations and sky levels0The best-fitting 
model for photometric errors as a function of J-band magnitude 
(based on data in the range 14 < mf < 22) thus obtained was 

log 10 crf ' = 0.3416 ■ mf - 7.7095, (8) 

(k) 

where a) is to be interpreted as the standard deviation of a Gaus- 
sian distribution of observed magnitudes around some unobserved 
true magnitude rri j . 

This model seems overly optimistic for bright sources (i.e. 

(k) 

rrij ~ 14, where the model predicts photometric errors on the 
order of 0.1 %!), and even th e assumption of Gaussian noise is prob- 
ably naive dDominikf 2008) ; still, the model does at least provide a 
means of making the simulated lightcurves used for fitting exper- 
iments (Section [4} somewhat more realistic and more challenging 
to fit. 



3 EVOLUTIONARY ALGORITHMS 
3.1 Overview 

Evolutionary algorithms (hereafter EAs), are metaheuristic optimi- 
sation algorithms which tend to yield good results on a wide range 

2 The data are available online at http://ogle.astrouw.edu.pl 



of (even extremely difficult) mathematical optimisation problems. 
EAs draw inspiration from evolutionary biology - especially pop- 
ulation genetics - and they incorporate, in a computational setting, 
notions such as natural selection/survival of the fittest, genetic re- 
combination, inheritance, and mutation. The first E A-based mathe - 
matical optimiser was proposed in the mid-1970s ( lHolland|[l975h . 
and since then, many modifications and improvements to the basic 
algorithm have been develo ped, including mechan isms without any 
direct biological analogues faaupt & Haupll2004h . 

So-called genetic algorithms (GAs) form one of the most 
successful subsets, and certainly the best-known subset, of evolu- 
tionary algorithms; evolution strategies (ESs), developed indepen- 
dently from (though more or less concurrently with) GAs, form 
another well-known subset. In spite of the rich variety of their po- 
tential incarnations, most EAs share a basic working scheme. 

As an example, consider a typical GA. The algorithm starts 
with a large, randomly-generated population of candidate solutions 
(called individuals or phenotypes) to the optimisation problem at 
hand, and associates with each solution an encoded version of the 
phenotype (called a chromosome, genotype or an individual's ge- 
netic material), as well as a problem-specific measure of the so- 
lution's quality (fitness). Then, by repeated application of 'genetic 
operators' mainly at the genotypic level, the algorithm causes the 
population as a whole to increase in phenotypic fitness - that is, 
solutions evolve towards optimality. 

A typical (though simplistic and by no means general or opti- 
mal) working scheme for a GA is as follows: 

(i) construct a random initial population of genotypes; 

(ii) decode the genotypes and evaluate their phenotypic fitness; 
if the fittest phenotype matches the user-defined target fitness (or 
other termination criterion), terminate, otherwise continue; 

(iii) produce offspring by stochastic selection and recombina- 
tion of genetic material in the current population, favouring the 
genes of individuals with high phenotypic fitness; 

(iv) introduce, with some low probability, random changes 
(copying errors) into the genetic material of the offspring; 
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(v) replace low-fitness members of the old population with the 
offspring created in the previous step, and return to step (ii). 

This scheme is illustrated in Fig. [2] The selective recombination 
of the population's genetic material exploits information associ- 
ated with good solutions to build even better ones, and the random 
mutations serve to inject entirely new and potentially favourable 
material into the gene pool that could not be obtained simply by 
recombining the genetic material of existing individuals. Such an 
evolutionary scheme has a number of features which distinguish it 
from random heuristics, albeit that it might bear superficial res em- 
blance to e.g. a standard Monte Carlo approach dGregorvll2005l) . 

It may be shown that, subject to a few reasonable assumptions, 
the canonical GA will al ways converge to the global optima in the 
searc h space in question feiben. Aarts & Heelll99ll ; lMichalewiczl 
1 19961) : moreover, it is also possible, though usually not straightfor- 
ward, to estimate the rates a t which solutions are likely to be lo- 
cated on given problems (e.g. lHollandll975l ;lTh" terens & Goldberg! 
1 1994) . 

ESs are similar to GAs in many respects, although usually they 
avoid solution encoding or discretisation, evolve the algorithm's 
control parameters in tandem with th e trial solutio ns, and/or use 
more sophisticated mutation schemes (Kramer 2010). 

It should be noted that in many problems - microlensing mod- 
elling included - one usually cannot hope to arrive at a unique 
or clear-cut, globally-optimal solution, and one must instead try 
to sample all possible optima in a multimodal parameter space. 
Even if an EA is not specifically set up to search for multiple solu- 
tions, mutation operators ensure exploration of the entire parameter 
space (provided the evolutionary sequence is sufficiently long) - as 
such, given a completed evolutionary sequence, multiple possible 
solutions can be obtained simply by extracting all explored phe- 
notypes that meet some fitness criterion (as opposed to extracting 
only the fittest phenotype in the final population state, which should 
correspond to a global optimum). Moreover, if one knows before- 
hand that one is dealing with multimodal search spaces, simple 
modifications can be made to make EAs more efficient at search- 
ing for multiple solutions. A simple approach is to begin a new 
evolutionary sequence whenever the population starts to stagnate 
around an optimum; alternatively, multiple independent popula- 
tions can be evolved concurrently. A more sophisticated approach 
might be to dedicate a fraction of the evolutionary population(s) to 
global exploration of the search space, and the remaining fraction 
purely to the exploit ation of promising solutions already discovered 
jTsutsuiet ail 19971) . 



3.2 Advantages of evolutionary algorithms 

EAs in general offer a number of benefits that suggest their suitabil- 
ity for dealing with the challenges discussed in Section [23l some 
of these are outlined below. 

(i) Robustness. EA-based optimisers can handle problems where 
the spaces to be searched for optima are multimodal, have lo w- 
contrast or have a very high dimensionality dCharbonneaull 19951) . 

(ii) Simplicity. In order to solve a given optimisation problem, 
most 'off-the-shelf EAs require only a single, unambiguous mea- 
sure of the quality (fitness) of candidate solutions. They do not 
require, for example, gradients or Hessian matrices, the computa- 
tion of which might be prohibitively difficult or impossible in some 
problems. 

(iii) Speed. Apart from the intrinsically high speed with which 



EAs tend to explore large parameter spaces, they are embarrass- 
ingly parallel: that is, very little effort is required to transform a 
serial implementation into a parallel implementation. Thus they are 
well-suited to exploiting high-performance hardware such as multi- 
core workstations, graphics processing units (GPUs), clusters, etc.: 
see Section [5T2l To be sure, canonical EAs (GAs, ESs , etc.) do have 
a reputation for being in efficient local optimisers l lCharbonnead 
1 19951 : lMichalewiczlll996h . but this problem can readily be miti- 
gated either by coupling an EA to a dedicated local optimiser - e.g. 
a gradient-based method - or, as suggested in Section [331 below. by 
making simple tweaks to an EA's underlying mechanisms, so as to 
endow it with local optimisation capabilities. 

(iv) Versatility. A single EA-based optimiser can be expected 
to yield 'good enough' results on a very wide class of problems 
- from a problem as simple as fitting a three-parameter Gaus- 
sian to some data, to one as complex as choosing a molecular 
configuration t o minimize a Buckingham potential with hundreds 
of parameters l lWehrens & Buvdenslll998l ) - and it is easy to in- 
corporate problem-sp ecific knowledge into an EA-based solver 
faaupt & Haupll2004) . 

The widespread adoption of EAs in fields such as engineer- 
ing, chemistry, biology, and economics bears testimony to their 
many merits, and though their uptake in the physical sciences has 
been somewhat slower (at least partly because the theoretical un- 
derstanding of their workings is still quite limited), recently they 
have already been used wit h great success in many branches of as- 
tronomy and astrophysics jRajpaulll20l J). In fact, a classical ge- 
netic algorithm - PIKAIA - has already been tested in the context 
of microlensing modelling l lKubasll2005l) . although it has not seen 
much use in recent years (at least, not in the microlensing commu- 
nity); unlike PIKAIA, though, the new algorithm presented here is 
tailored for high performance numerical optimisation, rather than 
for more general and pedagogical purposes. 

3.3 A new algorithm for fitting microlensing events 

The algorithm developed by the author for fitting microlensing 
lightcurves combines features of extant evolutionary algorithms 
(GSs, ESs) with some novel ones. The algorithm's main features 
are the following (a more detailed examination of the 'nuts and 
bolts' of the algorithm is presented in Appendix A): 

(i) a real-valued representation of solutions is used (i.e. parame- 
ter domains are not discretised, as would've been the case with e.g. 
a binary-coded GA); 

(ii) fitness-ranking can, in general, be performed according to 
arbitrary Bayesian priors and likelihood functions; 

(iii) individuals are chosen for reproduction via a vari- 
ant of the well-known tournament selection mechanism (e.g. 
iMiller & Goldberdl 19951) ; 

(iv) reproductive pairings comprise exactly two parent solu- 
tions, and give rise to exactly two offspring; 

(v) the offspring are constructed in one of three ways - as ran- 
dom interpolants of their parents, as exact duplicates of their par- 
ents, or as solutions where each parameter is drawn randomly but 
without modification from one of the two parents; 

(vi) both coarse-grained ('jump') and fine-grained ('creep') mu- 
tation operators are used, facilitating respectively global and local 
optimisation, with the mutation rates being dynamically adjusted in 
response to changes in population fitness; and 

(vii) the evolutionary sequence is restarted whenever the popu- 
lation stagnates around an optimum. 
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The algorithm runs autonomously in the sense that, given a set 
of data and a general model (e.g. the SBLM) to be fit to the data, the 
algorithm will search intelligent^ through the model parameter 
space to find solutions compatible with the data, without requiring 
any intervention from or assistance on the part of the user. 

The algorithm's development was informed by tests car- 
ried out not only on single- and binary-lens fitting problems, but 
also on a range of more general nonline ar optimisation problems 
jMore et al.|[l98ll ; lHaupt & Hauptll2004l) . The use of the dual mu- 
tation operators and dynamically-adjusted mutation rates means 
that, unlike more traditional EAs, the algorithm fares well at 
both global exploration and local optimisation. The jump oper- 
ator allows for large jumps through the entire parameter space, 
whereas the creep operator is more conservative, and introduces 
only small (log-uniformly distributed) perturbations to existing so- 
lutions. The hybridised operators used for offspring construction 
are also more compatible with local fine-tuning than the canonical 
genetic crossover ope rator, which is well know n to be antithetical 
to local optimisation (Eiben & S chipperslll998l ). Finally, the algo- 
rithm was designed in an array programming paradigm (wherein 
all operators are applied to a single population matrix rather than 
to individual solutions) and, as such, the main algorithm is able to 
comprise fewer than 100 lines of code when implemented in high- 
level language like MATLAB or Python. 



4 FITTING EXPERIMENTS AND RESULTS 

To test the EA, it was pitted against other optimisation algorithms 
and given the task of fitting several hundred simulated binary-lens 
lightcurves, each containing between one and four peaks (SBLM 
lightcurves with more than four peaks are theoretically possible 
though, assuming the parameter ranges in Table Q] very rare.) The 
primary aim of the experiments was to study the feasibility of using 
the EA to explore binary model parameter spaces, and to quantify 
the algorithm's relative search efficiency/fitting success. 



4.1 Algorithms compared in tests 

The following four algorithms are compared in the tests that follow. 

(i) Grid-search algorithm. A simple brute-force approach, this 
algorithm evaluates systematically a 'grid' of points, i.e. a prede- 
fined, uniformly-sampled subset of the model parameter space. The 
grid can be made arbitrarily fine (so that optimal solutions can be 
located with arbitrary accuracy) but computation time, which in- 
creases linearly with the total number of gridpoints, places a practi- 
cal limit on the number of gridpoints that can be tested. In the fitting 
experiments, each dimension of the parameter space was discre- 
tised into the same number of linearly-spaced gridpoints, with the 
total number of gridpoints constrained by a predetermined limit on 
the fitting time. For example, if 10 6 trial solutions could be eval- 
uated within the time limit, and there were 7 parameters to be fit 
(as is the case for the SBLM), In 10 6 / In 7 ~ 7 different values of 
each parameter would've been tested. 

For reasons outlined in Section [23] such an algorithm should 



3 The algorithm's 'intelligence' stems from its ability to exploit (via e.g. 
the reproduction and fine-grained mutation operators) structure in the search 
space to guide its search, as well its ability to adapt its own control param- 
eters in response to the progress of the search. 



not be expected to be suitable for efficient modelling of microlens- 
ing events; still, it is included in these tests as a foil for the more 
sophisticated algorithms. Moreover, grid search algorithms have 
found some (a dmittedly limited) use in the context of m icrolensing 
modelling (e. g. lKubasl20ol ; lBennettl20 1 d ; lGaudil20 id) , which un- 
derscores the paucity of search/optimisation algorithms, other than 
brute-force approaches, capable of reliably making headway on mi- 
crolensing modelling problems. 

(ii) Iterated simplex algorithm. The 'downhill simplex method', 
also known as the 'amoeba algorithm' or the 'Nelder-Mead 
method', is an exceptionally popul ar nonlinear optimisation tech- 
nique, due to lNelder & Meadl i 1 9651) . The downhill simplex method 
is used primarily for local optimisation, although it is endowed with 
some global optimisation capabilities, and like many other heuris- 
tic algorithms - including evolutionary algorithms - its operation 
is based purely on the evaluation of a fitness/objective function; it 
does not, for example, depend on numerical or analytical gradients 
of such a function. 

Given an optimisation function in n free variables, the method 
starts by computing the values of the function at n + 1 different 
points, which form the vertices of a general simplex. Then, by ex- 
trapolating the behavior of the function from the measurements 
made at each test point, the algorithm rapidly adapts the simplex 
to the local landscape, in such a way that the simplex 'explores' its 
way downhill, until finally contracting onto a local minimum. 

In the microlensin g fitting exper i ments , the modern downhill 
simplex a lgorithm of Lagarias et al.l (1998,) was used. As demon- 
strated bv lCharbonneaul j2002[) , the use of multiple, short simplex 
runs (hereafter an 'iterated simplex' approach), rather than one very 
long simplex run, greatly enhances performance on difficult global 
optimisation problems: accordingly, in the microlensing fitting ex- 
periments, a number of random starting points were chosen in the 
search space, and at each point, a new downhill simplex optimiser 
was deployed. An individual simplex run was terminated when con- 
vergence to a local optimum was achieved, or when 10 4 iterations 
had been reached. As with the grid search approach, the total num- 
ber of starting points considered was limited only by the available 
computational time. 

Given the popularity, speed, and modus operandi of the amoeba 
method, the iterated simplex approach constitutes a very reasonable 
competitor for an evolutionary algorithm. 

(iii) Evolutionary algorithm. This is the newly-developed algo- 
rithm outlined in Section [373l Default algorithm parameters, as dis- 
cussed in Appendix A (viz. a fixed population size of A'pop = 1000, 
a starting jump mutation probability of p mut = 1%, etc.), were used 
in the fitting experiments. 

Although the algorithm was designed to have both global explo- 
ration and local optimisation capabilities, its fits can very easily 
be fine-tuned by a dedicated local optimiser such as the amoeba 
method described above. Therefore, the performance of the algo- 
rithm both with and without the benefit of amoeba fine-tuning was 
studied; in the former case, the amoeba method was used to fine- 
tune only one (viz. the fittest) solution returned by the evolutionary 
algorithm. 

(iv) Artificial neural network (ANN). Artificial Neural Net- 
works' (ANNs, sometimes referred to simply as 'neural networks') 
are mathematical/computational models that are inspired by bio- 
logical neural networks. In an ANN, simple computational pro- 
cessing elements ('artificial neurons') are joined together to form 
a complex network of processing nodes, mimicking the structural 
and functional aspects of biological neural networks. ANNs are 
popular in the fields of regression, classification, pattern recog- 
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nition, and decision-making; possibly the most attractive feature 
of ANNs, however, is their ability to be used as arbitrary func- 
tion approximation mechanis ms that can 'leam' from observed data 
dMichalewicz & FogefeOOd) . In other words, they can perform au- 
tonomous, black-box modelling of the unknown (and possibly very 
complex) functional relationship between a set of input and out- 
put vectors, e.g. the relationship between a set of microlensing 
lightcurves, and their underlying model parameters. 

The ANN considered in the fitting ex periments t hat fo l low is 
a sophisticated algorithm, developed by IVermaakl J2003l . 1 20071) 
specifically for the purpose of fitting SBLM lightcurves. Unlike the 
other algorithms discussed here, Vermaak's ANN-based fitter does 
not perform any iterative search through parameter space; instead, 
the mapping from lightcurves to model parameters is approximated 
directly by a complicated function derived from a very large set of 
training lightcurves. Although the ANN does require several hours 
to train, once-trained, fits can be generated very rapidly. 

In many senses, the philosophies underpinning the ANN-based 
and EA-based fitting approaches are antithetical: the neural net- 
work is highly optimised for the fitting of SBLM lightcurves - in- 
deed, it is a truly bespoke algorithm - but, unless it is retrained 
from scratch, it can only fit SBLM lightcurves, and it certainly can- 
not handle more general optimisation problems. As with the the 
EA, however, the ANN's output can readily be fine-tuned by a ded- 
icated local-optimiser; accordingly, the ANN's performance both 
with and without the benefit of amoeba optimisation is considered 
in the fitting experiments that follow. 

The fits performed by the grid search, the iterated simplex 
and the evolutionary algorithm were all guided by the minimi- 
sation of a x 2 -statistic. It is well known that such a maximum- 
likeli hood approach leads to biased parameter estimates dDominikl 
120081) ; however, as already noted, the EA is fully-equipped for deal- 
ing with arbitrary Bayesian priors and likelihood functions. In fact, 
the assumptions of flat priors and Gaussian noise conveniently ren- 
ders the more general fitting problem eq uivalent to that of a x 2 - 
minimisation problem JPress et al]|2007l) . though in principle the 
EA could quite easily be used to obtain e.g. maximum a-posteriori 
parameter estimates instead. 

Although the ANN-based fitting did not involve the direct 
computation of any chi-square statistic (or indeed any goodness- 
of-fit statistic) during fitting, its training was based on a chi-square 
metric, so in the context of these fitting experiments, the ANN 
might be thought of as an indirect means of attempting a chi-square 
minimisation. 

It should also be noted that none of the algorithms tested here 
explicitly incorporated any information specific to the microlensing 
modelling problem; this is discussed in more detail in Section [54l 

4.2 Test setup 

Following Vermaak (2003|), the SBLM with parameter ranges as in 
Table Q] was used to generate lightcurves comprising 100 magnifi- 
cation points, starting at a random time i s tart, where 

- 3 < 4tart ~ tm < -2, (9) 

tE 

and ending at a random time tend, where 

2 < * end ~ tro < 3. (10) 

IE 

The use of synthetic data (where input parameters are known 
exactly) meant that fitting success could be gauged not only in 



terms of goodness-of-fit, but more importantly, also in terms of 
adequate sampling of the 'correct' regions of parameter space. Ac- 
cordingly, a fitting sequence was deemed successful if it returned at 
least one model satisfying A\ 2 /v < 1, with errors in all fitted pa- 
rameters of less than 10 percent; a weaker criterion of A\ 2 jv < 1 
only, corresponding to a good fit(s), but not necessarily to closeness 
in parameter space, was also considered (here A\ 2 '■= X 2 — Xtme> 
where \ 2 is the chi-square goodness of fit statistic for the lightcurve 
associated with a given solution/model, Xtruc is the same statistic 
for the input lightcurve, and v is the number of degrees of freedom 
for the fit). 

The fitting experiments were carried out on a modem work- 
station with a six-core, 12-threaded, 3.6 GHz processor, and 6 GB 
of 1333 MHz ECC RAM. To level the playing fields, all algorithms 
except for the ANN were parallelised, and restricted to 20 minutes 
per fit (the ANN generated fits far more rapidly, and in any case, 
would not have benefited from a longer run time). 

In the case of the grid search algorithm, parallelisation en- 
tailed using all the available processor threads to evaluate twelve 
grid-points concurrently; in the case of the iterated simplex al- 
gorithm, twelve downhill-simplex runs were carried out in paral- 
lel. Parallelisation of the EA was equally trivial: each generation, 
twelve candidate solutions from the evolutionary population were 
evaluated concurrently. Since the computational expense of evalu- 
ating candidate solutions (mapping parameters to lightcurves and 
comparing the resultant lightcurves to observed/simulated data) 
dwarfed the expense of applying evolutionary operators (selec- 
tion, reproduction, mutation, etc.) to the population, extremely effi- 
cient parallelisation was achieved by parallelising only one or two 
lines of the EA's code, i.e. those that called external, microlensing- 
related functions. The parallelisation of EAs is discussed in more 
detail in Section [53l 

4.3 Results: noise-free lightcurves 

In the first test, lightcurves were uniformly-sampled and noise- 
free; again for c onsistency with the formalism and experiments of 
IVermaakl fe003h . a value of a (k) = 0.01 was adopteqj for the pur- 
poses of calculating A\ 2 /v. Although this A\ 2 jv < 1 criterion 
has little statistical value in its own right, it does at least provide 
a means to quantify how closely a fitted lightcurve matches an in- 
put one (since, in the noise-free case, A\ 2 is proportional to the 
mean squared error for the fit), and thereby to rank the different 
algorithms. 

Results from this test are presented in Table[2] 
Sans the benefit of the amoeba-based local optimisation, the 
accuracy of both the grid search and the ANN-based fits was un- 
acceptably poor (success rate close to 0%). The addition of the lo- 
cal optimisation improved the algorithms considerably, and when 
viewed in terms of success rates (as high as 70% for the ANN) the 
results seem quite respectable; still, for critical work, a failure rate 
of around 30% is far from ideal. 

Far more impressive was the EA which, coupled with the 
amoeba method, was easily the most accurate of all the algorithms: 
it failed to yield a very accurate fitted model for fewer than 4% of 
the lightcurves. Even without the additional amoeba-based optimi- 
sation, the EA performed extremely well in its own right, which 
bears testimony to the EA's strength as both a global and a local 

4 Roughly speaking, this scaling - though otherwise arbitrary - meant that 
fits satisfying the Ax 2 jv < 1 criterion looked good 'by eye'. 



8 V. Rajpaul 



Table 2. Comparison of several techniques used to fit randomly-generated, noise-free binary lightcurves. 
A fitted model was deemed successful if it had Ax 2 / 1 ' < 1 and errors in all fitted parameters < f0%; 
success rates using the weaker criterion of A% 2 /v < 1 only are shown in parenthesis. 



Method Reference Percentage successful fits 

1 peak 2 peaks 3 peaks 4 peaks 



Grid search 


This work 


1(1) 





(0) 





(0) 





(0) 


Iterated simplex 


This work 


43 (98) 


36 


(93) 


26 


(78) 


16 


(50) 


Artificial neural network by itself 


Vermaak 


4(6) 





(0) 





(0) 





(0) 


Artificial neural network with amoeba 


Vermaak 


70 (96) 


70 


(82) 


62 


(72) 


68 


(76) 


Evolutionary algorithm by itself 


This work 


93 (100) 


96 


(98) 


90 


(93) 


89 


(89) 


Evolutionary algorithm with amoeba 


This work 


97 (100) 


98 


(100) 


97 


(98) 


94 


(94) 



Table 3. Comparison of the EA with a conventional algorithm for fitting randomly-generated, noisy 
binary lightcurves, including random temporal gaps in the data, and blended light. Success criteria as in 
Tableg] 



Method 


Reference 




Percentage 


successful fits 








1 peak 


2 peaks 


3 peaks 


4 peaks 


Iterated simplex 


This work 


39 (95) 


29 (84) 


13 (47) 


12 (42) 


Evolutionary algorithm with amoeba 


This work 


95 (98) 


96 (99) 


90 (97) 


87 (99) 



optimiser. Additionally, in the handful of cases where the EA (with 
or without the amoeba method) failed, allowing the algorithm to 
run a little longer (e.g. for 30 minutes, instead of the arbitrarily- 
imposed 20-minute limit) always yielded successful fits. The same 
concession applied to the iterated simplex algorithm, but the time 
required to ensure successful fits was invariably much longer, i.e. 
on the order of several hours. 

All algorithms fared better when fitting simpler (single- or 
double-peaked) lightcurves than when fitting the more complex 
ones, although for the EA this difference was very small. Indeed, it 
is quite surprising just how accurately the EA was able to home in 
on model parameters (in the case of successful fits, almost always 
to an accuracy of much better than 1%), even when the lightcurves 
had little apparent structure. 

4.4 Results: noisy, blended lightcurves 

In order to make the fitting problem more challenging, the follow- 
ing changes were made: 

(i) temporal sampling of lightcurves was randomised by 
drawing observation times from a rectangular distribution on 
(tstartj tend), thus allowing for gaps, corresponding e.g. to bad 
weather, in the data; 

(ii) Gaussian noise, as per the model defined by Eqn. [8] was 
added to all datapoints; 

(iii) the possibility of significant blending was allowed for. 

Allowing for blending (wherein the presence of unlensed light, 
along the line-of-sight to the lens, dilutes the amplification signal 
of the source) meant extending the SBLM to include a new param- 
eter, / £ (0, 1]. If blended light is present, the observed ampli- 
fication can, assuming co nstant blending, be computed as follows 
dPi Stefano & Esinll 19951) : 

A(t) = f-A s (t) + (l-f), (11) 

where As(t) is the microlensing amplification of the source (the 
amplification that appears in Eqn. [7J, and the parameter / is im- 
plicitly defined to be the ratio of the unlensed source flux, to the 



total unlensed flux (source flux, plus flux of all luminous sources 
along the line-of-sight to the lens, including flux from the lens it- 
self) in the telescope beam. Although the blending effect is rela- 
tively trivial to calculate (and in fact, a least-squares estimate for 
/ can be obtained directl y from a simple linear equation; see e.g. 
Ijaroszvnski & Maol200ll) . the presence of blended light does intro- 
duce ambiguities between competing models. 

For consistency's sake, all other aspects of the experiment 
were left unchanged. 

Table [3] gives an indication of how both the EA and the con- 
ventional algorithm fared on this somewhat more realistic problem 
(for brevity, this time only the performance of the amoeba fine- 
tuned algorithms is presented). The EA's performance remained 
excellent, with success rates generally in excess of 90%; this com- 
pares very favourably with the conventional algorithm's success 
rates of around 25%. Fig. [3] gives an example of a typical fit per- 
formed by the EA. 

Unfortunately no data were available to facilitate a direct com- 
parison (i.e. using an identical noise model, randomised tempo- 
ral sampling, and blending) with Vermaak's ANN; his conclusion, 
however, based on experiments with a very similar noise model was 
that noise has detrimental effects on the accuracy of ANN-based 
fitting, but that an ANN can at leas t be designed/(re)trained to mit- 
igate these effects (Vermaak 2007). As such, even if one assumes 
that the ANN can perform as well with the noisy lightcurves as it 
did with the noise-free ones, the EA still emerges as the far more 
accurate algorithm. 



5 DISCUSSION 

5.1 Relative fitting performance of the EA 

Though the EA is, arguably, more computationally expensive than 
the ANN (once suitably trained, the ANN can perform a fit in un- 
der a minute; then again, actually training the ANN takes much 
longer, so it is not easy to compare the computational expense of 
the EA and the ANN on an equal footing), it certainly offers a 
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search efficiency far beyond that of mere brute-force approaches, 
as evidenced by its clear outperformance of the grid search and it- 
erated simplex approaches (both of which were calibrated to have 
the same computational footprint as the EA). The computational 
expense notwithstanding - in any event, a fitting time of several 
minutes would be 'fast enough' even for modelling ongoing events 
- the EA offers a number of striking advantages over the ANN ap- 
proach insofar as lightcurve analysis is concerned: 

(i) the fitting accuracy of the EA is superior to that of the ANN; 

(ii) the algorithm itself is much simpler (it is relatively easy to 
code from scratch and to fine-tune an EA, which is certainly not the 
case with an ANN); 

(iii) the algorithm can be used straight 'out of the box', and re- 
quires no training; 

(iv) the algorithm is far more robust since it doesn't need to be 
retrained/reconfigured to cope with noise, extensions to the basic 
lightcurve model, or even entirely different models (see Section 
E6]>;and 

(v) the EA does not yield a single 'all or nothing' fit - a single 
run yields very many fitted models, and the longer the algorithm 
is allowed to run, the better the chance of pinpointing the globally 
optimal models(s). 



5.2 The importance of parallelisation 

A fitting time on the order of several minutes sounds appealing, 
but unfortunately the simple binary-lens model adopted here ne- 
glects many effects often associated with real-world microlensing 
events, including parallax, instrumental, and finite-source effects. 
Dealing with some of these effects poses no significant difficulties, 
and requires only simple extensions to the basic SBLM; in par- 
ticular, however, computing the magnification of an extended (as 
opposed to point-like) source can take around two orders of magni- 
tude longer than the corres ponding calcula t ions under th e point- 
sourc e approximation (e.g IVermaakl |2007| ; iGould 120081 ; iBozzd 
120101) . Unfortunately, finite-source effects g enerally cannot be ig- 
nored when dealing with planetary signals jVermaakll2000h . so in 
such cases the claims about the algorithm's speed are seemingly 
invalidated. To make matters worse, the cadence of modern wide- 
field surveys can be nearly two orders of magnitude higher than 
in the lightcurves simulated here, i.e. on the order of 15 minutes 
dShvartzvald & Maodl2012l) . The algorithm needn't be changed in 
any way to deal with higher cadences, though fitting times will, of 
course, increase in proportion to the increase in data ratesQ This, 
combined with having to deal with finite-source effects, would sug- 
gest a total increase in fitting times by around four orders of mag- 
nitude! 

Certainly, some performance gains can be expected from a 
meta-optimisation of the EA's control parameters (i.e. 'fine-tuning' 
the algorithm specifically for fitting microlensing lightcurves); 
more importantly, however, EAs are highly amenable to paral- 
lel implementations, and they parallelise especially well when the 
cost of evaluating candidate solutions is high (as is certainly the 
case with microlensing modelling). Indeed, multi-core and multi- 
processor computers, clusters, grids, cloud computing platforms, 
and general-purpose GPUs (with upwards of a thousand streaming 



5 Current work does, however, suggest the feasibility of using data com- 
pression to reduce full lightcurves to a smaller number of information- 
carrying datapoints, in order to accel erate analysis of binary events 
iHeavens et alj200(j ; lHundertmarkll2012ri . 



processors) are becoming increasingly prevalent in modern com- 
puting, and EAs are ideally suited to exploit such devices and plat- 
forms. 

For example, by implementing an EA on a fairly hig h-end 
(though not top of the range) GPU card, iMaitre et al.l d2009 ) man- 
aged to achieve a speed-up of roughly 100 x relative to the same 
EA running on a standard 3.6 GHz processor, and advocated 
the use of multiple GPU cards to achieve even more dramatic 
spee dups. A number of other authors have reported similar results 
(see Risco-Martn et al. 20 l j, and references contained therein): 
IPospichal. Jaros & Schwarzl bold) , for example, managed to speed 
up a GA by a factor of nearly ten thousand, again using only fairly 
modest GPU hardware! 

The possibility of such dramatic performance gains suggests 
that even when dealing with finite-source effects, it should be quite 
possible to constrain the EA's fitting times to under an hour. Porting 
existing code to GPU architectures does require quite specialised 
programming knowledge, but, given the potential gains, such an 
undertaking might well be a worthwhile investment of time and 
effort. 



5.3 Possible EA parallelisation schemes 

While EAs are not unique in being easy to parallelise (e.g. the grid 
search algorithm parallelises trivially!), EAs have the advantage of 
being amenable to a number of different parallelisation schemes - 
some straightforward, some more sophisticated - and this provides 
scope for optimising an EA's performance for a spe cific hardware 
configuration and/or problem jHaupt & Hauptll 20041) . 

The most trivial of all parallelisation schemes is to parallelise 
only the evaluation, each generation, of candidate solutions, with 
the rest of the algorithm - which will usually have a very small 
computational footprint relative to the evaluation of candidate so- 
lutions - left in serial form. Since selection occurs globally and 
within a single population, such EAs are usually referred to as 
'panmictic' or 'micro-grained'. It was such an approach that was 
adopted for the EA discussed in this paper. (In problems where the 
computational cost of evaluating candidate solutions is relatively 
inexpensive, the cost of applying evolutionary operators cannot be 
ignored and efficient parallelisation would be harder to achieve.) 

Another popular approach for parallelising EAs is to as- 
sign to each available processor an entire evolutionary popula- 
tion, and then to allow these populations to evolve more-or-less 
independently, perhaps allowing periodic 'communication' (e.g. in 
the form of migration) between the populations. These so-called 
'coarse-grained' or 'island' EAs are ideally suited for implemen- 
tation on distributed memory MIMD (multiple instructions, multi- 
ple data) machines; in fact, even with all else being equal, island 
EAs t end to outperform single population EAs dGordon & Whitlevl 
Il993l) . with premature convergence being less of an issue, and op- 
timal solutions usually being located more quickly. In this sense, 
parallelising an EA can lead not only to improved computational 
efficiency, but also to a more effective search algorithm! 

A third major class of parallel EA is the 'cellular' or 'fine- 
grained' EA, which can be thought of as being intermediate to the 
two aforementioned classes. In such EAs, the evolutionary opera- 
tors are decentralised, and each processor handles a number of very 
small evolutionary populations (perhaps containing only one or two 
individuals), each of which can interact with a number of 'nearby' 
populations. Such an implementation is a often a natural choice on 
an SIMD (single instruction, multiple data) computer. 
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Figure 3. A typical noise-free lightcurve generated via the SBLM, simu- 
lated observations of this lensing event, and the lightcurve for the model, as 
found by the EA, that best fits this (noisy) data. 



techniques, problem-specific information, etc.), even on very dif- 
ficult problems. This consideration might be a largely academic 
one in the context of modelling binary events (where bespoke algo- 
rithms that incorporate problem-specific information have already 
been developed and implemented with success, e.g. iKains et aU 
12009b . but this consideration would apply to any difficult modelling 
problem, including ones where bespoke algorithms have not yet 
been developed. 



5.5 EA vs. Monte Carlo optimisation 

It is worth mentioning that Monte Ca rlo methods are ubiquitous in 
the fi e ld of microlensing m odelling dSumi et alj|201ol : I Shin et al.l 
1201 ll ; ISkowron et aU 1201 ll) . During the course of this work, a 
Markov chain Monte Carlo (MCMC) method, incorporating an 
adaptive Metropoli s sampler and a delayed rejection mechanism 
dHaario et "all l200t3) . was deployed and used to perform fits on 
single-lens events. The MCMC method was found to be nearly two 
orders of magnitude slower than the EA, and so no in-depth tests of 
the MCMC method were carried out for binary-lens events. In any 
event, even if an EA can perform initial fits more quickly, Monte 
Carlo would remain indispensable e.g. for estimating uncertainties 



in fitted parameters found by the EA dPress et alj '2007) 



5.4 Incorporation of problem-specific information 

As already noted, none of the algorithms tested incorporated any 
information specific to the structure of the microlensing mod- 
elling problem, e.g. knowledge of lightcurve morphologies or of 
the topologies of underlying critical curves and source trajectories. 
Incorporating such prior knowledge could be used to make most 
of the algorithms, except the ANN, more efficient: that is, rather 
than letting them search 'blindly' through the full SBLM param- 
eter space, problem-specific information could be used to reduce 
the volume of the space of feasible parameters, and c ould provide 
clues as to where good solutions are likely to be found dKains et"al] 
120091) . Insofar as this work goes, the main reason microlensing- 
specific information was not incorporated was to facilitate a more 
fair comparison of the other algorithms with Vermaak's ANN, into 
which the incorporation of problem-specific information would be 
very difficult. Still, some more general arguments could be made 
in favour of the non-inclusion of such information. Incorporating 
knowledge of the SBLM and its associated lightcurves, for exam- 
ple, would increase the complexity and reduce the generality of 
the algorithms, and different information would need to be incor- 
porated if one wanted to fit, say, a triple-lens model instead of a 
binary model. Indeed, if the computational power is available for 
an algorithm to perform fits in a reasonable amount of time, the 
manual incorporation of problem-specific information would be, if 
nothing else, an unnecessary expense of human effortQ 

Indeed, we are at a stage where the computational power is 
available to drive intelligent search/optimisation algorithms that 
can rival the active modelling efforts of humans (using analytical 



6 By way of analogy, knowledge of the properties of polynomials can often 
be used to find some of the roots of certain classes of polynomial but, es- 
pecially when dealing with higher-order polynomials, it is usually far more 
straightforward and efficient to find the roots with a general root-finding al - 
gorithm than to try to apply analytical results jtCing 1996; Pre ss et al.l2007l) . 



5.6 Future work 

Although some of the merits of EAs have been extolled in this pa- 
per, this work serves merely as a proof of concept for autonomous, 
EA-based microlensing modelling: the EA's relative performance 
is promising, but it remains to be seen how it will fare when faced 
with more complex microlensing modelling problems. Indeed, the 
simple binary-lens model and simulation parameters adopted here 
(primar ily for the sake of allo wing a direct comparison with the re- 
sults of lVer maak 2003. 120071) led to the exclusion of many effects 
usually associated with real-world microlensing events, including 
finite-source, parallax, and lens orbital-motion effects. Some of the 
parameter ranges used here would need to be extended - or, bet- 
ter yet, replaced with more realistic Bayesian priors that avoid hard 
limits - to cover cases where the secondary lens is an exoplanet. Fi- 
nally, to be more representative of modern, high-cadence surveys, 
temporal sampling of simulated lightcurves would need to be in- 
creas ed by about two orders of magnitude dShvartzvald & Maozl 
l2012h - this would increase fitting times, but should also help to 
resolve some model ambiguities. 

Still, since the EA managed to move effortlessly from single- 
lens to binary-lens to noisy binary fitting problems, one can specu- 
late that fitting problems involving an extended SBLM should not 
pose major obstacles to the EA. Of course, given a more complex 
modelling problem, one would need to sacrifice some fitting accu- 
racy, and/or harness more computing power (as suggested in Sec- 
tion ^. 2t if one wanted to leave fitting times unchanged. 

An immediate extension to the present work will be an in- 
vestigation by the author of the EA's utility in modelling ongoing 
microlensing events, the hope being that it might provide a fast 
and accurate means of forecasting important features in real-world 
lightcurves. This in turn could facilitate better-informed observa- 
tions, and ultimately, more useful data. In fact, preliminary results 
from this work (using more realistic simulated lightcurves, fea- 
turing greatly expanded parameter ranges, much higher temporal 
sampling, blended light, etc.) have been promising - the EA does 
seem well-suited to the efficient location of solutions compatible 
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with existing data, and once such solutions have been found, it is 
a straightforward task to generate predictions of e.g. the time of a 
caustic crossing. Successful EA-based forecasts have been already 
been obtained - allowing fitting times on the order of an hour or 
two, though with the general algorithm otherwise set up in much the 
same way as described in this work - on a small selection of real 
events (using truncated versions of their completed lightcurves), 
including MOA-201 1-BLG-197/OGLE-201 1-BLG-0265Q lending 
some credence to the general feasibility of an EA-based fitting ap- 
proach. However, it remains to quantify rigorously the extent to 
which intrinsic model degeneracy and nonlinearity hinder reliable 
forecasts, both with and without the incorporation into the fore- 
casting of problem-specific information into the modelling. Results 
from this work are to appear in a dissertation (currently in prep.) by 
the author. 



6 CONCLUSIONS 

A novel evolutionary algorithm was introduced and was shown to 
be well suited to fitting binary-lens microlensing lightcurves. De- 
spite the difficulty of this fitting problem, the algorithm yielded ex- 
cellent fitting accuracy whilst maintaining a relatively modest com- 
putational footprint, and offering a number of other desirable prop- 
erties (robustness, versatility, trivial parallelisation, etc.). As such, 
this work provided proof of concept for the use of an evolution- 
ary algorithm as the basis for real-time, autonomous modelling of 
microlensing events. It was noted, however, that further work is re- 
quired to investigate how the algorithm will fare when faced with 
more complex and realistic microlensing modelling problems. 

Indeed, this work could be extended in a number of ways, in 
particular by relaxing some of the assumptions made here, and also 
by investigating how an evolutionary algorithm will fare at trying to 
predict features in the lightcurves of ongoing microlensing events. 
Early investigations in this regard have been promising. 

Though evolutionary algorithms can by no means be the 'fi- 
nal word' in microlensing modelling, a simple evolutionary algo- 
rithm such as the one presented here could serve as a useful addi- 
tion to the toolbox of a microlensing modeller, or indeed, of any 
astronomer working with difficult model-fitting problems. 



ACKNOWLEDGMENTS 

This work was supported financially by the University of Cape 
Town and the National Research Foundation. The author wishes to 
thank the anonymous referee for his/her helpful comments, con- 
structive criticism, and patience; similarly, the author expresses 
gratitude to Dr John Menzies and Dr Ian Stewart. 



REFERENCES 

Asada H., 2002, A&A, 390, LI 1 
Beaulieu J., et al., 2006, Nature, 439, 437 

Bennett D. R, 2008, Detection of Extrasolar Planets by Gravita- 
tional Microlensing. Springer, pp 47-88 
Bennett D. P., 2010, ApJ, 716, 1408 
Bennett D. P., Rhie S. H., 1996, ApJ, 472, 660 
Bourassa R. R., Kantowski R., Norton T. D., 1973, ApJ, 185, 747 

7 See http://www.stelab.nagoya-u.ac.jp/ %)7Esumi/anomaly/201 1/. 



Bozza V., 2010, MNRAS, 408, 2188 

Burnham K. P., Anderson D. R., 2002, Model selection and mul- 
timodel inference: a practical information- theoretic approach. 
Springer 

Cassan A., et al., 2012, Nature, 481, 167 
Charbonneau P., 1995, ApJS, 101, 309 

Charbonneau P., 2002, Technical report, An Introduction to Ge- 
netic Algorithms for Numerical Optimization. National Center 
for Atmospheric Research 

Di Stefano R., Esin A. A., 1995, ApJ, 448, LI 

Dominik M., 1999, A&A, 341, 943 

Dominik M., 2008, in Manchester Microlensing Conf., Modelling 

Microlensing Events 
Dominik M., 2009, MNRAS, 393, 816 

Eiben A. E., Aarts E. H. L., Hee K. M. v., 1991, in Proc. 1st 
Workshop on Parallel Problem Solving from Nature, PPSN I, 
Global convergence of genetic algorithms: A markov chain anal- 
ysis. Springer- Verlag, London, UK, pp 4-12 

Eiben A. E., Schippers C. A., 1998, Fundam. Inf., 35, 35 

Gaudi B. S., 2010, preprint (astro-ph/1002.0332v2) 

Gaudi B. S., et al., 2008, Science, 319, 927 

Gaudi B. S„ Naber R. M., Sackett P. D., 1998, ApJ, 502, L33 

Gordon V. S., Whitley D., 1993, in Forrest S., ed., Proc. 5th Int. 
Conf. on Genetic Algorithms, Serial and parallel genetic algo- 
rithms and function optimizers. Morgan Kaufmann, San Mateo, 
CA, USA, pp 177-183 

Gould A., 2008, ApJ, 681, 1593 

Gregory P. C, 2005, Bayesian Logical Data Analysis for the 
Physical Sciences: A Comparative Approach with 'Mafhemat- 
ica' Support. Cambridge University Press 

Haario H., Laine M., Mira A., Saksman E., 2006, Statistics and 
Computing, 16, 339 

Haupt R. L., Haupt S. E., 2004, Practical Genetic Algorithms. 
Wiley-Interscience 

Heavens A. E, Jimenez R., Lahav O., 2000, MNRAS, 317, 965 

Holland J. H, 1975, Adaptation in Natural and Artificial Systems. 
The University of Michigan Press, Ann Arbor 

Hundertmark M., 2012, Characterising the Information Con- 
tent of Microlensing Light Curves, presented at 16th Int. 
Conf. on Gravitational Microlensing (available online at 
http://www.ipac.caltech.edu/wfir2012) 

Jaroszyfiski M., Mao S., 2001, MNRAS, 325, 1546 

Kains N., et al., 2009, Monthly Notices of the Royal Astronomical 
Society, 395, 787 

King R. B., 1996, Beyond the Quartic Equation. Modern 
Birkhauser Classics, Springer 

Kramer O., 2010, Evolutionary Intelligence, 3, 51 

Kubas D., 2005, PhD thesis, University of Potsdam 

Lagarias J. C, Reeds J. A., Wright M. H., Wright P. E., 1998, 
SIAM Journal of Optimization, 9, 1 12 

Maitre O, Baumes L. A., Lachiche N., Corma A., Collet P., 2009, 
in Proc. 1 1th Annu. Conf. on Genetic and Evolutionary Comput., 
GECCO '09, Coarse grain parallelization of evolutionary algo- 
rithms on GPGPU cards with EASEA. ACM, New York, NY, 
USA, pp 1403-1410 

Mao S., Di Stefano R„ 1995, ApJ, 440, 22 

Mao S., Paczynski B., 1991, ApJ, 374, L37 

Michalewicz Z., 1996, Genetic Algoritms + Data Structures = 
Evolution Programs. Springer, Berlin, AL 

Michalewicz Z., Fogel D. B., 2000, How to Solve It: Modern 
Heuristics, 2nd edn. Springer 

Miller B. L., Goldberg D. E., 1995, Complex Systems, 9, 193 



12 V. Rajpaul 



More J. J., Gai'bow B. S., Hillstrom K. E., 1981, ACM Trans. 

Math. Softw., 7, 17 
Nelder J. A., Mead R., 1965, The Computer Journal, 7, 308 
Night C, Di Stefano R., Schwamb M., 2008, ApJ, 686, 785 
Pospichal R, Jaros J., Schwarz J., 2010, in Chio D., et al. eds, 

Lecture Notes in Computer Science, Vol. 6024, Applications of 

Evolutionary Computation. Springer Berlin/Heidelberg, pp 442- 

451 

Press W. H., Teukolsky S. A., Vetterling W. T, Flannery B. P., 
2007, Numerical Recipes: The Art of Scientific Computing, 3rd 
edn. Cambridge University Press, New York 

Quanz S. P., Lafreniere D., Meyer M. R., Reggiani M. M., Buenzli 
E., 2012, A&A, 541, A133 

Rajpaul V., 201 1, in Basson I., Botha A. E., eds, Proc. SAIP 201 1, 
Genetic algorithms in astronomy and astrophysics. UNISA, Pre- 
toria, pp 519-524 

Ralston A., Rabinowitz P., 2001, A First Course in Numerical 
Analysis, 2nd revised edn. Dover Publications 

Risco-Martn J., Lanchares J., Coello-Coello C, 2012, Soft Com- 
puting - A Fusion of Foundations, Methodologies and Applica- 
tions, 16, 185 

Shin I.-G., et al., 201 1, ApJ, 735, 85 

Shvartzvald Y., Maoz D., 2012, Monthly Notices of the Royal As- 
tronomical Society, 419, 3631 
Skowron J., et al., 2011, ApJ, 738, 87 
Sumi T, et al., 2010, ApJ, 710, 1641 
Sumi T, et al., 201 1, Nature, 473, 349 

Thierens D., Goldberg D., 1994, in Davidor Y, Schwefel H.-P, 
Mnner R., eds, Lecture Notes in Computer Science, Vol. 866, 
Parallel Problem Solving from Nature PPSN III. Springer 
Berlin/Heidelberg, pp 119-129 

Tsutsui S., Ghosh A., Come D., Fujimoto Y, 1997, in Proc. 7th 
ICG A., A real coded genetic algorithm with an explorer and an 
exploiter populations. Morgan Kaufmann, pp 238-245 

Udalski A., 2003, Acta Astronomica, 53, 291 

Vermaak P., 2000, MNRAS, 319, 1011 

Vermaak P., 2003, MNRAS, 344, 651 

Vermaak P., 2007, PhD thesis, University of Cape Town 

Wambsganss J., 2011, Nature, 473, 289 

Wehrens R., Buydens L. M. C, 1998, Trends in Analytical Chem- 
istry, 17, 193 
Witt H. J., 1990, A&A, 236, 311 



APPENDIX A: MECHANICS OF THE NEW ALGORITHM 

This appendix sketches the 'nuts and bolts' of the new evolution- 
ary algorithm introduced in Section [33] albeit without providing 
a detailed rationale for its various features An in-depth discus- 
sion/analysis of the algorithm is to be included in a dissertation by 
the author (currently in prep.); suffice it to say, however, that most 
aspects of the algorithm's design were informed by extensive em- 
pirical testing, as well as theoretical considerations. 

The central dynamical quantity used by the algorithm, as with 



8 Original MATLAB code for the algorithm is available from the author; 
the code itself is not presented here because, though very compact, it would 
make little sense without detailed documentation to explain the various ex- 
ternal functions called by the code, MATLAB syntax and operators, etc. In 
any event, the description of the algorithm given here is sufficient to con- 
struct, in any language, a working version of the algorithm. 



most EAs, is an (encoded) population matrix, P: 

W = W(t) = \p i , j ) NmXNva . (Al) 

¥(t) is constructed so that pij £ [0, 1] encodes the value of the 
j th physical parameter associated with the i individual (trial solu- 
tion) in the Apop-member evolutionary population, denoted a»j, at 
generation f 6 {1,2,..., Agen}. By default, a population size of 
A pop = 1000 is assumed; the number of fitting parameters, Apar, 
will be fixed by the problem at hand (e.g. A pal - = 8 for the fitting 
experiments in Section|4j4). 

P may be thought of as representing an encoded ensemble of 
trial solutions, with a row vector p^* representing a single trial so- 
lution, and an individual matrix element (pij) representing a small 
'chunk' or component of a trial solution. The relationship between 
P(i) and its decoded/physical counterpart, denoted A(t), can be 
represented thus: 

decoding 

\Pi,j] =!" ( A= [aij]; (A2) 

decoding 

the actual encoding/decoding process is explained in Section lA2l 
The basic evolutionary sequence used by the EA is as follows: 

(i) Initialise P(f = 1). 

(ii) Iterate t = 1,2,..., Ag en ; for each t: 

(a) decode P(t) — > A(t), and pass A(t) to an external fitness 
function; 

(b) rank P(i) based on the fitness returned by the external 
function; 

(c) select solutions for reproduction; 

(d) produce offspring and insert them into P(t); 

(e) apply jump and creep mutations to P(i), and 

(f) check whether the evolution has stagnated, and if so, make 
appropriate adjustments to the algorithm. 

(iii) Collate and output results, as required. 

The total number of evolutionary generations, A gen , can be used to 
control how long the algorithm runs: larger values of Ag en will lead 
to longer run-times, though with more solutions being evaluated. 
(Alternatively, the algorithm can be allowed to run for an arbitrary 
number of generations, until some user-defined convergence crite- 
rion is achieved.) 

The algorithm's basic working scheme is conceptually 
straightforward, and most of its features overlap broadly with those 
of the canonical genetic algorithm; many of its finer details and fea- 
tures - some unique to the algorithm, some inspired by other EAs 
- are, however, non-canonical. 

Each step in the aforementioned working scheme is described 
in the subsections below. 

Al Initialisation 

Initialising the population matrix entails simply assigning random 
variates with a standard uniform distribution, U(0, 1), to half of 
the elements in P. The remaining elements are then assigned the 
complements of the values already assigned. That is to say, if one 
element in the population matrix is assigned the value X £ [0, 1], 
another random element will be assigned the value (1— A) £ [0,1]. 

The initialisation scheme ensures a uniform and unbiased ini- 
tial sampling of the entire (encoded) parameter space. (It is possible 
to change the initialisation scheme to incorporate prior information 
via a bias in the initial population, although in practice, a more 
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robust and theoretically-sound approach is to incorporate prior in- 
formation into the external fitness function.) 



again during generation t, and is simply recomputed during gener- 
ation t + 1. 



A2 Encoding/decoding 

The following bicontinuous linear transformation is used to relate 
a physical parameter value, ai j, to its encoded counterpart, pi j £ 
[0,1]: 



a i,j = a 3 + (ft - a j)Vi,j 



(A3) 



where it assumed that the physical parameter aj can be restricted to 
some domain fij = [a.j,fij], with otj and /3j known. For example, 
if aj is an angle, a typical domain might be Qj = [0, 2n). Even 
if aj is in principle unbounded, physical considerations and/or 
prior knowledge associated with the problem at hand should al- 
ways allow bounds to be placed on the parameter. For example, 
if aj represents a binary-lens mass ratio, a safe choice might be 
ttj = [l(T 6 ,l],say. 

The interpretation of the encoded value Pi.j, as implicitly de- 
fined in Eqn. |A3| is straightforward: pij represents the fractional 
position of a,,j along its 'physical' domain [aj, /3j]. It should also 
be clear that the algorithm works with real-valued representations 
of parameters (pi.j £ M), rather than discretising the parameter 
domains in any way, as would've been the case with e.g. a binary- 
coded genetic algorithm. 

The encoding scheme amounts to working with all parameters 
normalised to the unit interval. The normalisation is largely a mat- 
ter of convenience, and means that all parameters can be treated 
on an 'equal footing' when applying evolutionary operators (with- 
out normalisation, a different mutation operator would be required 
for each different parameter, to ensure mutations don't produce pa- 
rameter values outside the allowable domains; with normalisation, 
a single mutation operator can be applied to P as whole, making for 
simpler code). As for the normalisation specifically to the unit in- 
terval: the unit interval corresponds identically to the support of the 
standard uniform distribution - which serves as a basis for sampling 
from most other probability distributions - a fact that simplifies the 
coding of the stochastic aspects of the algorithm. Indeed, it is pos- 
sible to forego parameter encoding altogether, though this would 
come at the cost of more complex code for handling mutation op- 
erators, crossover operators, etc. 



A3 Fitness evaluation and ranking 

Fitness evaluation entails passing the decoded trial solutions in 
A(t) to some external, problem-specific function that assigns a 
well-defined figure of merit, or fitness, to each solution. In the case 
of data modelling, this figure of merit will usually be related to 
a normalised fitting error - e.g. a \ 2 statistic (more specifically, 
something like 1 — x 2 or 1/x 2 so that, as per convention, better so- 
lutions can be associated with a higher fitness). An important point 
is that fitness-ranking can actually be performed according to arbi- 
trary Bayesian priors and likelihood functions, say - in fact, the al- 
gorithm itself makes absolutely no assumptions about the statistical 
paradigm (e.g. Bayesian vs. frequentist) in which data modelling is 
taking place. 

Once the solutions have been evaluated, the rows of the en- 
coded population matrix W(t) are sorted so that the fittest solution 
occupies row 1, the next fittest solution row 2, and so on, with the 



A4 Selection 

Once the quality of each trial solution has been ascertained, this 
information is used to select the solutions that will reproduce, i.e. 
the solutions that will have their 'genetic material' (encoded chunks 
of solutions) propagated to the next generation. 

Reproductive pairings are picked via a variant of the well- 
know n 'tournament selection' mechanism dMiller & Goldberg! 

The scheme used to choose any parent is as follows: 
draw a random sample of N t0 am £ N* values from the set 
{1,2,..., A p0 p}. Find the minimum value in that sample, and let 
this number be the rank of a parent chosen for reproduction. This 
scheme is repeated as many times as is necessary to produce the 
desired number of reproductive pairings. By default, a value of 
N toum = [Apop/25] is used. 

The basic idea behind tournament-style selection is that it cor- 
responds to running a number of small tournaments or 'fights' be- 
tween solutions, with the 'victor' of an individual tournament, viz. 
the fittest (best-ranked) solution in that particular tournament, be- 
ing allowed to reproduce. The larger the tournament, the higher the 
selection pressure, i.e. the less likely that poor solutions will be 
chosen for reproduction. 

Note that, because the selection scheme is rank-based rather 
than fitness-based, the exact choice and normalisation of the fitness 
function (e.g. 1 — \ 2 vs. 1 /\ 2 vs. 1 — \/~X 2 ~) is largely irrelevant 
to the algorithm, so that performance will generally not hinge on a 
'clever' choice of a fitness function. More precisely, the algorithm's 
performance will be identical when using any two fitness functions 
that can be related via a monotone (order-preserving) transforma- 
tion. 



A5 Reproduction 

Offspring are produced by combining the genetic material of parent 
solutions. For simplicity it is assumed that exactly two parents are 
involved in any reproductive pairing, and that each such pairing 
gives rise to two offspring. Assuming the parent solutions chosen 
for a particular pairing arepi,, andp,,*, the algorithm will produce 
two offspring via the following two convex combinations of parent 
solutions: 



and 



F+ Pj , t ■ (I-F), 



Pi,* -F + pi,, • (I-F), 



(A4) 



(A5) 



worst solution in row N„ 



is not sorted as it is not be used 



where F = [fij] is an A par x N piu - diagonal 'weighting' matrix, with 
all elements constrained to the unit interval, and I is the N par x A par 
identity matrix. 

The actual choice of F determines the characteristics of the 
reproduction. For example, if /j,, = ~ V i, the offspring will sim- 
ply be arithmetic averages of the parent solutions, as may easily 
be confirmed via Eans. lA"4l and lA51 on the other hand, if each 
is assigned randomly, then the offspring will be correspondingly 
random interpolants of the parents solutions. 

The algorithm assigns F as follows. For each reproductive 
pairing, with equal probability, assign diagonal elements of the 
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weighting matrix W according to any one of the following three dis- 
tributions: 

' / M ~W(0,1), or 

< ~Bin(l,i),or (A6) 
hi =lVi. 

The distribution used to make the assignment is chosen randomly 
for each reproductive pairing. In the first case, the genes of the off- 
spring will be random interpolants of the parental genes; in the 
second case the offspring's genes will be randomly drawn from the 
combined gene-pool of the parents, but no interpolation will oc- 
cur; and the final case corresponds to asexual reproduction wherein 
offspring are identical copies of their parents (which is useful for 
giving the upcoming mutation operators multiple opportunities to 
fine-tune a solution that is already very good). Both the second and 
third schemes can be thought of as limiting cases of the first, more 
general scheme. 

All solutions in P(t), save for the fittest one, are replaced by 
the newly-created offspring. This non-replacement of the fittest in- 
dividual in P(t) is referred to as 'elitism' in the literature (e.g. 
iMichalewicz & Fogell2000l) . and is a standard feature of most EAs. 
A practical implication of elitism is that the fitness of the fittest in- 
dividual in the population will increase monotonically with t (un- 
less of course the evolutionary sequence is restarted, as discussed 
in Section lATl - even in this case, however, the evolution history of 
the population will be saved, so in a global sense, the fitness of the 
best solution known will still increase monotonically with t). 

A6 Creep and jump mutation 

All newly-created offspring are subject to fine-grained mutations, 
also known as 'creep mutations'. These mutations randomly in- 
crease or decrease an encoded value by a randomly-chosen step 
with a log-uniform distribution: 

^ " ' "' • 1 pI = <>. + (i - Pi,) ■ x j (A7) 

where the actual assignment (pfj or is chosen with equal 
probability, \nX ~ W(ln£,lnl), and e is the machine epsilon. 
Note that the creep mutations are fully compatible with the algo- 
rithm's parameter encoding scheme because V X, p?v £ [0, 1]. 

In addition to the creep mutations, the offspring are exposed to 
the possibility of coarse-grained or 'jump' mutations. With a jump 
mutation, an element of the population matrix is 'flipped' to a com- 
pletely random value on the unit interval: 

Pij ->X~f/(0,l). (A8) 

A jump mutation is assumed to happen with some small proba- 
bility < pmut <C 1, so that in a given generation, on average 
Pmut x iVpop x iVpar elements of P(t) are subject to a jump mutation. 
By default, p lmlt = 1% is used. (The mutation operators were de- 
signed to strike a balance between adequate global exploration and 
efficient local optimisation, and their design was informed both by 
very extensive empirical testing as well as theoretical considera- 
tions.) 

A7 Stagnation checks 

If the evolution is deemed to have stagnated - as determined by 
any criterion, the default one being the fitness of the fittest individ- 



ual in P(t) not having improved by more than 1% over the past 10 
generations - the jump-mutation probability will be boosted (in- 
creased by 50%); conversely, if the evolution has not stagnated, 
the jump-mutation probability will be throttled back (decreased by 
50%). This autonomous boosting/throttling back of mutation prob- 
abilities, in response to the rate of improvement of solution fitness, 
serves to shift emphasis between global exploration (as mediated 
by the jump mutation operators) and local optimisation (as medi- 
ated, in part at least, by the more conservative creep mutation oper- 
ators). 

As a further step, if repeated boosts to the jump-mutation 
probability do not have the desired effect, i.e. stagnation flags are 
raised repeatedly, P(t) will be saved to memory, and the population 
will be reinitialised as was done at t — 1, so that P(t + 1) is a fresh 
evolutionary population, with no dependence on P(t). 

A8 Outputs 

By default, the algorithm outputs all dynamical quantities (popula- 
tion matrices, associated fitness vectors, mutation rates, etc.) com- 
puted for all t. The outputs from the algorithm can, however, be 
tailored, with minimal effort, to suit one's needs: in simple prob- 
lems, one might only be interested in a single 'best solution' (and 
the quality of that solution), whereas with complex, multimodal 
problems, one might wish to obtain A(t) for all t, and the asso- 
ciated fitness vectors, e.g. x 2 (A), in order to construct confidence 
intervals for model parameters, say. 



